Input files
In this section, we will read in the required input files for MARVEL. The formatting requirement for each input file will be explained.
Normalised gene expression
# Matrix
path <- "Data/Gene_SingCellaR/"
file <- "matrix_normalised.mtx"
df.gene.norm <- readMM(paste(path, file, sep=""))
df.gene.norm[df.gene.norm[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 2.1168501 . 0.9731413 0.4139587 0.9874593
## [2,] 0.7056167 . . . .
## [3,] 2.1168501 3.453436 1.9462826 3.3116695 1.9749185
## [4,] 1.4112334 . 0.4865707 0.4139587 1.4811889
## [5,] 4.2337003 2.302291 4.3791359 3.3116695 1.9749185
# cell metadata
path <- "Data/Gene_SingCellaR/"
file <- "phenoData.txt"
df.gene.norm.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.norm.pheno)
## cell.id donor.id cell.type
## 1 AAACCTGAGCCGGTAA_SRR9008752 SRR9008752 Cardio day 4
## 2 AAACCTGAGTACCGGA_SRR9008752 SRR9008752 Cardio day 4
## 3 AAACCTGAGTGCTGCC_SRR9008752 SRR9008752 Cardio day 4
## 4 AAACCTGAGTGTACGG_SRR9008752 SRR9008752 Cardio day 4
## 5 AAACCTGAGTGTCCAT_SRR9008752 SRR9008752 Cardio day 4
## 6 AAACCTGCAGCCTGTG_SRR9008752 SRR9008752 Cardio day 4
# feature metadata
path <- "Data/Gene_SingCellaR/"
file <- "featureData.txt"
df.gene.norm.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.norm.feature)
## gene_short_name
## 1 MIR1302-2HG
## 2 FAM138A
## 3 OR4F5
## 4 AL627309.1
## 5 AL627309.3
## 6 AL627309.2
- For the sparse matrix, rows represent genes, columns represent cells, and the values represent normalised gene expression values that has not yet been log2-transformed.
- For the master cell metadata, each row represents one cell. Compulsory column is
cell.id and this will be used to annotate the columns of the sparse matrix. All other columns are optional.
- For the feature metadata, each row represents one gene. Compulsory column is
gene_short_name and will be used to annotate the rows of the sparse matrix. All other columns are optional.
- Here, the normalised gene expression matrix was generated using Cell Ranger and only cell passing QC was retained using SingCellaR (Wang et al., 2022).
Gene counts
# Matrix
path <- "Data/Gene_STARsolo/"
file <- "matrix_counts.mtx"
df.gene.count <- readMM(paste(path, file, sep=""))
df.gene.count[df.gene.count[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 1 2 1 3 2
## [2,] 1 2 . . .
## [3,] 1 1 . . .
## [4,] 5 6 8 16 8
## [5,] 3 12 9 10 13
# cell metadata
path <- "Data/Gene_STARsolo/"
file <- "phenoData.txt"
df.gene.count.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.count.pheno)
## cell.id
## 1 AAACCTGAGAAGATTC_SRR9008754
## 2 AAACCTGAGAAGGGTA_SRR9008754
## 3 AAACCTGAGCACACAG_SRR9008754
## 4 AAACCTGAGGGCTTGA_SRR9008754
## 5 AAACCTGAGTTGTCGT_SRR9008754
## 6 AAACCTGCAAGTCTGT_SRR9008754
# feature metadata
path <- "Data/Gene_STARsolo/"
file <- "featureData.txt"
df.gene.count.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.count.feature)
## gene_short_name
## 1 MIR1302-2HG
## 2 FAM138A
## 3 OR4F5
## 4 AL627309.1
## 5 AL627309.3
## 6 AL627309.2
- For the sparse matrix, rows represent splice junctions, columns represent cells, and the values represent UMI-collapsed, non-normalised, non-log2-transformed gene counts.
- For the cell metadata, each row represents one cell. Compulsory column is
cell.id and this will be used to annotate the columns of the sparse matrix. All other columns are optional.
- For the feature metadata, each row represents one gene. Compulsory column is
gene_short_name and will be used to annotate the rows of the sparse matrix. All other columns are optional.
- Here, the gene counts were generated using STARsolo (Kaminow et al., 2021).
Splice junction counts
# Matrix
path <- "Data/SJ_STARsolo/"
file <- "matrix_counts.mtx"
df.sj.count <- readMM(paste(path, file, sep=""))
df.sj.count[df.sj.count[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 5 4 7 14 6
## [2,] 1 1 2 1 1
## [3,] 1 . . . .
## [4,] 1 1 . . .
## [5,] 1 . . . .
# cell metadata
path <- "Data/SJ_STARsolo/"
file <- "phenoData.txt"
df.sj.count.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.sj.count.pheno)
## cell.id
## 1 AAACCTGAGAAGATTC_SRR9008754
## 2 AAACCTGAGAAGGGTA_SRR9008754
## 3 AAACCTGAGCACACAG_SRR9008754
## 4 AAACCTGAGGGCTTGA_SRR9008754
## 5 AAACCTGAGTTGTCGT_SRR9008754
## 6 AAACCTGCAAGTCTGT_SRR9008754
# feature metadata
path <- "Data/SJ_STARsolo/"
file <- "featureData.txt"
df.sj.count.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.sj.count.feature)
## coord.intron
## 1 chr1:14830:14969
## 2 chr1:14830:185490
## 3 chr1:15039:186316
## 4 chr1:16766:16853
## 5 chr1:16766:16857
## 6 chr1:16766:16875
- For the sparse matrix, rows represent genes, columns represent cells, and the values represent UMI-collapsed, non-normalised, non-log2-transformed splice junction counts.
- For the cell metadata, each row represents one cell. Compulsory column is
cell.id and this will be used to annotate the columns of the sparse matrix. All other columns are optional.
- For the feature metadata, each row represents one splice junction. Compulsory column is
coord.intron and will be used to annotate the rows of the sparse matrix. All other columns are optional.
- Here, the splice junction counts were generated using STARsolo (Kaminow et al., 2021).
tSNE/UMAP coordinates
path <- "Data/Gene_SingCellaR/"
file <- "dim_red_coordinates_iPSC_CardioDay10.txt"
df.coord <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.coord)
## cell.id x y
## 1 AAACCTGAGAAGATTC_SRR9008754 15.06473 -12.685700
## 2 AAACCTGAGAAGGGTA_SRR9008754 23.00697 10.624021
## 3 AAACCTGAGCACACAG_SRR9008754 12.97456 -18.985449
## 4 AAACCTGAGCTTTGGT_SRR9008753 -20.56841 -1.531026
## 5 AAACCTGAGGGCTTGA_SRR9008754 36.47875 -6.242368
## 6 AAACCTGAGTCGTTTG_SRR9008753 -24.82610 -18.792021
- This may be coordinates of either UMAP or tSNE. These coordinates will be used to plot the cell’s gene expression or percent spliced-in (PSI) values on the reducted dimension space.
- Compulsory columns are
cell.id, x (x-axis coordinates) and y (y-axis coordinates). All other columns are optional.
- Here, the tSNE coordinates were generated using SingCellaR (Wang et al., 2022).
Gene transfer file
path <- "Data/GTF/"
file <- "refdata-cellranger-GRCh38-3.0.0.gtf"
gtf <- as.data.frame(fread(paste(path, file, sep=""), sep="\t", header=FALSE, stringsAsFactors=FALSE))
- This should be the same file previously used for indexing the genome (for STARsolo here), computing gene expression (for Cell Ranger here).
Pre-process data
This step annotates the gene and splice junction metadata, and subsequently enables us to retain only high-quality splice junctions and genes of particular biotype, e.g., protein-coding genes.
Validate junctions
Only splice junctions whose start and end mapped to same gene are retained.
marvel <- ValidateSJ.10x(MarvelObject=marvel)
## [1] "239458 annotated and uniquely-mapped SJ identified and retained"
## [1] "957704 unannotated SJ identified and removed"
## [1] "11617 multi-mapping SJ identified and removed"
## [1] "5379 unannotated and multi-mapping SJ identified and removed"
## [1] "SJ metadata and SJ count matrix updated"
Subset CDS genes
Moving forward, we will only retain genes with coding sequence (CDS), i.e., protein-coding genes.
marvel <- FilterGenes.10x(MarvelObject=marvel,
gene.type="protein_coding"
)
## [1] "19893 of 33514 genes met filtering criteria and retained"
## [1] "227264 of 239458 SJ met filtering criteria and retained"
## [1] "Normalised gene and SJ metadata and matrix updated"
Pre-flight check
This step ensures that our data is ready for further downstream analysis including differential expression analysis, functional annotation, and candidate gene analysis.
To this end, we will have to make sure the columns of the matrices align with the cell IDs of the sample metadata and the rows of the matrices align with the feature metadata. Finally, the columns across all matrices should align with one another.
marvel <- CheckAlignment.10x(MarvelObject=marvel)
## [1] "Matching gene list in normalised and raw (count) gene matrices..."
## [1] "19893 genes found in normalised gene matrix"
## [1] "33538 genes found in raw (count) gene matrix"
## [1] "19893 overlapping genes found and retained"
## [1] "Matching cells across normalised gene, raw (count) gene, raw (count) SJ matrices..."
## [1] "32056 cells found in normalised gene matrix"
## [1] "32056 cells found in raw (count) gene matrix"
## [1] "32056 cells found in raw (count) SJ matrix"
## [1] "32056 overlapping cells found and retained"
## [1] "Checking column (sample) alignment..."
## [1] "Sample metadata and normalised gene matrix column names MATCHED"
## [1] "Normalised and raw (count) gene matrix column names MATCHED"
## [1] "Raw (Count) gene and SJ matrix column names MATCHED"
## [1] "Checking row (gene names/SJ coordinates) alignment..."
## [1] "Gene metadata and normalised gene matrix row names MATCHED"
## [1] "Normalised and raw (count) gene matrix row names MATCHED"
## [1] "SJ metadata and raw (count) SJ matrix row names MATCHED"
## [1] "32056 cells and 16071 genes consisting of 227264 splice junctions included for further analysis"
Our data is ready for downstream analysis when only MATCHED flags are reported. If any NOT MATCHED flags are reported, please double-check the input file requirements.
Save R object
Given the time taken for data pre-processing, it’ll be a good idea to save the MARVEL object at this step. The object may be loaded whenever the user begins any one of the downstream analysis.
# Save object
path <- "Data/R object/"
file <- "MARVEL.RData"
save(marvel, file=paste(path, file, sep=""))
# Load object
# path <- "Data/R Object/"
# file <- "MARVEL.RData"
# load(paste(path, file, sep=""))
Explore expression
Due to the huge number of splice junctions detected, it will not be time-effective to include all splice junctions for differential splicing analysis. Therefore, we will only include splice junctions expressed in at least a user-defined proportion of cells for differential splicing analysis. To determine this threshold, we will explore the number of cells expressing a given gene and splice junction.
We will explore the gene and splice junction expression rate in iPSCs and day-10 cardiomyocytes as we will be performing differential splicing analysis on these two cell populations.
Gene expression rate
# Define cell groups
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# Group 1 (reference)
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Group 2
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 5937
# Explore % of cells expressing genes
marvel <- PlotPctExprCells.Genes.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells=5
)
marvel$pct.cells.expr$Gene$Plot

head(marvel$pct.cells.expr$Gene$Data)
## cell.group gene_short_name n.cells.total n.cells.expr pct.cells.expr
## 2 cell.group.g1 NOC2L 11244 5590 49.72
## 5 cell.group.g1 HES4 11244 687 6.11
## 6 cell.group.g1 ISG15 11244 7172 63.79
## 7 cell.group.g1 AGRN 11244 2381 21.18
## 12 cell.group.g1 SDF4 11244 4440 39.49
## 13 cell.group.g1 C1QTNF12 11244 565 5.02
min.pct.cells option. The minimum percentage of cells expressing a given gene for that gene to be included for expression exploration analysis here.
- Here, we observe majority of genes to be expressed in ~10% of cells.
SJ expression rate
marvel <- PlotPctExprCells.SJ.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells.genes=5,
min.pct.cells.sj=5,
downsample=TRUE,
downsample.pct.sj=10
)
marvel$pct.cells.expr$SJ$Plot

head(marvel$pct.cells.expr$SJ$Data)
## cell.group coord.intron n.cells.total n.cells.expr pct.cells.expr
## 25 cell.group.g1 chr1:1629705:1630291 11244 1227 10.91
## 26 cell.group.g1 chr1:1634329:1634450 11244 832 7.40
## 32 cell.group.g1 chr1:1721712:1722707 11244 867 7.71
## 42 cell.group.g1 chr1:2400936:2402206 11244 1066 9.48
## 47 cell.group.g1 chr1:3775484:3775794 11244 1467 13.05
## 79 cell.group.g1 chr1:8868058:8870451 11244 674 5.99
min.pct.cells.genes option. The minimum percentage of cells expressing a given gene for that gene to be included for expression exploration analysis here. This threshold may be inferred from after running the PlotPctExprCells.Genes.10x function earlier.
min.pct.cells.sj option. The minimum percentage of cells expressing a given splice junction for that splice junction to be included for expression exploration analysis.
downsample option. Down-sample the number of splice junctions prior to expression exploration analysis.
downsample.pct.sj option. When downsample set to TRUE, the percentage of splice junctions to keep for expression exploration analysis.
- Here, we observe majority of splice junctions to be expressed in ~10% of cells.
Differential analysis
Differential analysis is the cornerstone of RNA-sequencing analysis. This is the first step to identify candidate genes and isoforms for downstream experimental validation.
For differential splicing analysis, MARVEL utilises the permutation-based approach for comparing the PSI values between two pseudo-bulk samples (Efremova et al., 2020).

For differential gene expression analysis, MARVEL utilises the Wilcoxon rank-sum test to compare the normalised, log2-transformed gene expression values between the single-cells that constitute the two pseudo-bulk samples (Satija et al., 2015)
Differetial splicing analysis
marvel <- CompareValues.SJ.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells.genes=10,
min.pct.cells.sj=10,
seed=1,
n.iterations=100,
downsample=TRUE,
show.progress=FALSE
)
## [1] "5937 cells from Group 1 and 5937 cells from Group 2 included"
## [1] "7322 genes expressed in cell group 1"
## [1] "9751 genes expressed in cell group 2"
## [1] "6928 genes expressed in BOTH cell group and retained"
## [1] "3784 SJ expressed in cell group 1"
## [1] "6866 SJ expressed in cell group 2"
## [1] "7087 SJ expressed in EITHER cell groups and retained"
## [1] "Total of 7087 SJ from 2765 genes included for DE analysis"
## [1] "Computing PSI for cell group 1..."
## [1] "Computing PSI for cell group 2..."
## [1] "Creating null distributions..."
## [1] "Computing P values..."
cell.group.g1 option. Cell IDs corresponding to the reference cell group.
cell.group.g2 option. Cell IDs corresponding to the non-reference cell group.
min.pct.cells.genes option. The minimum number of cells expressing a given gene for the gene to be included for DE analysis. This threshold may be determined from running the PlotPctExprCells.Genes.10x function earlier.
min.pct.cells.sj option. The minimum number of cells expressing a given splice junction for the splice junction to be included for DE analysis. This threshold may be determined from running the PlotPctExprCells.SJ.10x function earlier.
seed option. To ensure the null distributions and dowm-sampling process are always reproducible.
n.iterations option. The number of permutations to perform when building the null distribution.
downsample option. If set to TRUE, the number of cells of each cell group will be down-sampled. The number of cells to down-sample to will be based on the size of smaller cell group.
show.progress option. For brevity of this tutorial, we did not track the DE progress here. But users are advised to set this option to TRUE on their own devices.
Differetial gene analysis
marvel <- CompareValues.Genes.10x(MarvelObject=marvel,
show.progress=FALSE
)
## [1] "Performing Wilcox rank sum test..."
head(marvel$DE$SJ$Table)
## coord.intron gene_short_name n.cells.total.g1 n.cells.expr.sj.g1
## 1 chr1:1311925:1312017 INTS11 5937 346
## 2 chr1:1373903:1373999 AURKAIP1 5937 5858
## 3 chr1:1390564:1390765 CCNL2 5937 60
## 4 chr1:1402257:1405808 MRPL20 5937 2955
## 5 chr1:1405887:1406908 MRPL20 5937 1069
## 6 chr1:1629705:1630291 MIB2 5937 642
## pct.cells.expr.sj.g1 n.cells.expr.gene.g1 pct.cells.expr.gene.g1
## 1 5.83 2848 47.97
## 2 98.67 5897 99.33
## 3 1.01 1278 21.53
## 4 49.77 5845 98.45
## 5 18.01 5845 98.45
## 6 10.81 3045 51.29
## sj.count.total.g1 gene.count.total.g1 psi.g1 n.cells.total.g2
## 1 362 4078 8.88 5937
## 2 40782 49806 81.88 5937
## 3 60 1500 4.00 5937
## 4 4425 36002 12.29 5937
## 5 1199 36002 3.33 5937
## 6 708 4716 15.01 5937
## n.cells.expr.sj.g2 pct.cells.expr.sj.g2 n.cells.expr.gene.g2
## 1 972 16.37 3423
## 2 5798 97.66 5858
## 3 724 12.19 4240
## 4 3516 59.22 5859
## 5 1249 21.04 5859
## 6 526 8.86 2252
## pct.cells.expr.gene.g2 sj.count.total.g2 gene.count.total.g2 psi.g2
## 1 57.66 1067 5478 19.48
## 2 98.67 32201 38876 82.83
## 3 71.42 789 8516 9.26
## 4 98.69 5754 35035 16.42
## 5 98.69 1445 35035 4.12
## 6 37.93 547 2891 18.92
## log2fc delta pval mean.expr.gene.norm.g1.g2 n.cells.total.norm.g1
## 1 1.05163277 10.60 0 0.4152131 5937
## 2 0.01644263 0.95 0 2.1389883 5937
## 3 1.03703073 5.26 0 0.3803775 5937
## 4 0.39040352 4.13 0 1.9015034 5937
## 5 0.24177679 0.79 0 1.9015034 5937
## 6 0.31524434 3.91 0 0.3492695 5937
## n.cells.expr.gene.norm.g1 pct.cells.expr.gene.norm.g1 mean.expr.gene.norm.g1
## 1 2848 47.97 0.4209907
## 2 5897 99.33 2.5013445
## 3 1278 21.53 0.1613387
## 4 5845 98.45 2.1086064
## 5 5845 98.45 2.1086064
## 6 3045 51.29 0.4691008
## n.cells.total.norm.g2 n.cells.expr.gene.norm.g2 pct.cells.expr.gene.norm.g2
## 1 5937 3423 57.66
## 2 5937 5858 98.67
## 3 5937 4240 71.42
## 4 5937 5859 98.69
## 5 5937 5859 98.69
## 6 5937 2252 37.93
## mean.expr.gene.norm.g2 log2fc.gene.norm pval.gene.norm pval.adj.gene.norm
## 1 0.4094355 -0.01155529 4.132942e-01 4.209056e-01
## 2 1.7766321 -0.72471240 0.000000e+00 0.000000e+00
## 3 0.5994163 0.43807758 0.000000e+00 0.000000e+00
## 4 1.6944003 -0.41420606 0.000000e+00 0.000000e+00
## 5 1.6944003 -0.41420606 0.000000e+00 0.000000e+00
## 6 0.2294382 -0.23966252 4.513622e-144 9.004448e-144
show.progress option. For brevity of this tutorial, we did not track the DE progress here. But users are advised to set this option to TRUE on their own devices.
- Please note that MARVEL will only perform differential gene expression analysis on genes included for differential splicing analysis. These genes constitute the splice junctions included for analysis using the
CompareValues.SJ.10x function earlier.
Volcano plot: Splicing
marvel <- PlotDEValues.SJ.10x(MarvelObject=marvel,
pval=0.05,
delta=5,
min.gene.norm=1.0,
anno=FALSE
)
marvel$DE$SJ$VolcanoPlot$SJ$Plot

head(marvel$DE$SJ$VolcanoPlot$SJ$Data[,c("coord.intron", "gene_short_name", "sig")])
## coord.intron gene_short_name sig
## 2 chr1:1373903:1373999 AURKAIP1 n.s.
## 4 chr1:1402257:1405808 MRPL20 n.s.
## 5 chr1:1405887:1406908 MRPL20 n.s.
## 15 chr1:2189782:2193638 FAAP20 up
## 18 chr1:6197757:6199561 RPL22 n.s.
## 19 chr1:6820251:6825091 CAMTA1 n.s.
pval option. The p-value, below which, the splice junction is considered to be differentially spliced.
delta option. The absolute difference in mean PSI values between the two cell populations, above which, the splice junction is considered to be differentially spliced.
min.gene.norm option. The mean normalised gene expression value across all cells from both populations, above which, the corresponding splice junction is considered to be expressed.
Gene-splicing dynamics
MARVEL’s integrated differential gene and splicing analysis enables us to investigate how gene expression changes relative to splice junction changes when iPSCs differentiate into day-10 cardiomyocytes. The gene-splicing dynamics may be classified into four categories, namely coordinated, opposing, isoform-switching, and complex.
- Coordinated gene-splicing relationship refers to the change in mean gene expression is in the same direction with the corresponding splicing junction(s).
- Opposing gene-splicing relationship refers to the change in mean gene expression is in the opposite direction to the corresponding splicing junction(s).
- Isoform-switching refers to genes that are differentially spliced without being differentially expressed.
- Complex gene-splicing relationship refers to genes with both coordinated and opposing relationships with the corresponding splicing junctions.
Here, we will explore the gene-splicing dynamics of genes that are differentially spliced between iPSCs and day-10 cardiomyocytes. Representative examples of each dynamic will also be shown. This section also utilises the ad hoc plotting functions PlotValues.PCA.CellGroup.10x, PlotValues.PCA.Gene.10x, PlotValues.PCA.PSI.10x for plotting cell groups, PSI, and gene expression, respectively, on the reduced dimension space.
Please not users are required to run the CompareValues.SJ.10x and CompareValues.Genes.10x functions (refer to “Differential splicing analysis” section) before proceeding with this section.
Assign dynamics
marvel <- IsoSwitch.10x(MarvelObject=marvel,
pval.sj=0.05,
delta.sj=5,
min.gene.norm=1.0,
pval.adj.gene=0.05,
log2fc.gene=0.5
)
marvel$SJ.Gene.Cor$Proportion$Plot

marvel$SJ.Gene.Cor$Proportion$Table
## sj.gene.cor freq pct
## 2 Coordinated 101 18.738404
## 4 Opposing 83 15.398887
## 3 Iso-Switch 317 58.812616
## 1 Complex 38 7.050093
head(marvel$SJ.Gene.Cor$Data[,c("coord.intron", "gene_short_name", "cor.complete")])
## coord.intron gene_short_name cor.complete
## 1 chr1:2189782:2193638 FAAP20 Iso-Switch
## 2 chr1:11054886:11054961 SRM Opposing
## 3 chr1:11674817:11675081 MAD2L2 Opposing
## 4 chr1:20652529:20652620 DDOST Iso-Switch
## 5 chr1:22086867:22091427 CDC42 Coordinated
## 6 chr1:23313703:23318482 HNRNPR Coordinated
pval.sj option. The p-value, below which, the splice junction is considered to be differentially spliced.
delta.sj option. The absolute difference in mean PSI values between the two cell populations, above which, the splice junction is considered to be differentially spliced.
min.gene.norm option. The mean normalised gene expression value across all cells from both populations, above which, the corresponding splice junction is considered to be expressed.
pval.adj.gene option. The p-value, below which, the gene is considered to be differentially expressed.
log2fc.gene option. The absolute log2 fold change in gene expression, above which, the gene is considered to be differentially expressed.
- Here, we observe majority of differentially spliced genes underwent isoform-switching followed by coordinated, opposing, and then complex gene expression changes relative to splicing changes.
- Note that majority of differentially spliced genes may not be inferred directly from differential gene expression analysis alone. For example, only in coordinated gene-splicing relationship that the gene and splicing changes between iPSCs and day-10 cardiomyocytes are in the same direction. But this category only constitute around one-fifth of gene-splicing relationships.
Coordinated
# Define cell groups
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# iPSC
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Cardio day 10
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 5937
# Save into list
cell.group.list <- list("iPSC"=cell.ids.1,
"Cardio d10"=cell.ids.2
)
# Plot cell groups
marvel <- PlotValues.PCA.CellGroup.10x(MarvelObject=marvel,
cell.group.list=cell.group.list,
legendtitle="Cell group",
type="tsne"
)
## [1] "All cells defined with coordinates found"
plot_group <- marvel$adhocPlot$PCA$CellGroup
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="VIM",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr10:17235390:17235845",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_group, plot_gene, plot_sj, nrow=1)

- For VIM, both gene expression (middle) and the corresponding splice junction (right) shown here are up-regulated in day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas splicing rate represented by PSI scale.
Opposing
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="UQCRH",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr1:46310317:46316551",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_group, plot_gene, plot_sj, nrow=1)

- For UQRCH, the gene expression (middle) is up-regulated in iPSCs relative to day-10 cardiomyocytes.
- On the other hand, the corresponding splice junction (right) shown here is up-regulated in day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas splicing rate represented by PSI scale.
Iso-switch
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="RBM39",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr20:35740888:35741940",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr20:35739018:35740823",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj_2 <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_gene, plot_sj_1, plot_sj_2, nrow=1)

- For RBM39, the gene (left) is not differentially expressed between iPSCs and day-10 cardiomyocytes.
- On the other hand, both corresponding splice junctions (middle and right) are up-regulated in iPSCs relative to day-10 cardiomyocytes.
- Note gene expression represented by log2(expression) scale whereas splicing rate represented by PSI scale.
Complex
# Gene 1
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="TPM1",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr15:63064143:63065895",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr15:63044153:63056984",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_sj_2 <- marvel$adhocPlot$PCA$PSI
# Gene 2
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="TPM2",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr9:35684808:35685268",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr9:35684551:35685063",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_sj_2 <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot.1_gene, plot.1_sj_1, plot.1_sj_2,
plot.2_gene, plot.2_sj_1, plot.2_sj_2,
nrow=2
)

- For both TPM1 and TPM2 genes, their gene expressions (left) are up-regulated in day-10 cardiomyocytes relative to iPSCs.
- Similar to gene expression profile, one of the splice junctions (middle) for these genes are up-regulated in day-10 cardiomyocytes relative to iPSCs.
- On the other hand, the other splice junction (right) for these genes are down-regulated in day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas splicing rate represented by PSI scale.
Gene ontology analysis
Gene ontology analysis or pathway enrichment analysis categorises the differentially spliced genes between iPSCs and day-10 cardiomyocytes into biological pathways. This may identify sets of genes with similar function or belong to similar biological pathways that are concurrently spliced.
marvel <- BioPathways.10x(MarvelObject=marvel,
pval=0.05,
delta=5,
min.gene.norm=1.0,
method.adjust="fdr",
species="human"
)
## [1] "539 unique genes identified for GO analysis"
head(marvel$DE$BioPathways$Table)
## ID Description GeneRatio
## GO:0006413 GO:0006413 translational initiation 81/514
## GO:0006402 GO:0006402 mRNA catabolic process 104/514
## GO:0006613 GO:0006613 cotranslational protein targeting to membrane 64/514
## GO:0019080 GO:0019080 viral gene expression 74/514
## GO:0006119 GO:0006119 oxidative phosphorylation 48/514
## GO:0022613 GO:0022613 ribonucleoprotein complex biogenesis 69/514
## BgRatio enrichment pvalue p.adjust qvalue
## GO:0006413 192/18866 15.484618 7.365368e-76 1.521578e-72 1.308689e-72
## GO:0006402 376/18866 10.152248 7.667309e-76 1.521578e-72 1.308689e-72
## GO:0006613 109/18866 21.551137 4.346461e-72 4.312776e-69 3.709361e-69
## GO:0019080 195/18866 13.928804 4.244284e-65 2.105695e-62 1.811081e-62
## GO:0006119 149/18866 11.824198 2.338011e-38 5.799729e-36 4.988270e-36
## GO:0022613 482/18866 5.254347 2.570165e-30 5.667214e-28 4.874296e-28
## geneID
## GO:0006413 RPL11/YTHDF2/RPS8/RPL5/RPS27/TPR/CDC123/RPS24/RPLP2/EIF3F/RPS13/EIF3M/POLR2G/RPS3/RPS25/EIF4B/RPS26/RPL6/RPLP0/RPS29/EIF5/EIF2AK4/EIF3J/RPL4/RPLP1/RPS2/RPS15A/RPL13/RPL23/RPL19/EIF1/RPL38/RPL36/RPS28/EIF3G/RPL18A/UBA52/EIF3K/RPS16/RPS19/RPL18/RPL13A/RPS11/RPS9/RPL28/RPS5/RPS7/RPS27A/EIF5B/RPL31/RPL37A/EIF2S2/EIF3L/RPL3/RPSA/RPL14/RPL24/RPL35A/RPL9/RPL34/RPS3A/RPL37/RPS23/PAIP2/NPM1/RPS18/RPL10A/RPS12/HSPB1/RPS20/RPL7/RPL30/EIF3E/EIF3H/RPL8/RPS6/RPL12/RPL7A/RPS4X/RPL10/RPS4Y1
## GO:0006402 HNRNPR/RPL11/YTHDF2/PSMB2/THRAP3/YBX1/RPS8/SERBP1/RPL5/PSMA5/CSDE1/RBM8A/PSMD4/PSMB4/RPS27/HNRNPU/VIM/RPS24/RPLP2/RPS13/PSMC3/POLR2G/RPS3/RPS25/YBX3/RPS26/RPL6/RPLP0/HNRNPC/PSME2/RPS29/PSMA3/RPL4/RPLP1/PSMA4/RPS2/RPS15A/FUS/RPL13/PSMB6/PSMB3/RPL23/RPL19/PSMC5/DDX5/RPL38/CIRBP/RPL36/RPS28/HNRNPM/RPL18A/UBA52/PSMD8/RPS16/RPS19/RPL18/RPL13A/RPS11/RPS9/RPL28/RPS5/RPS7/RPS27A/RPL31/SSB/RPL37A/XRN2/YWHAB/RPL3/LSM3/RPSA/RPL14/RPL24/DHX36/FXR1/RPL35A/RPL9/RPL34/LSM6/RPS3A/RPL37/RPS23/NPM1/RPS18/RPL10A/SYNCRIP/RPS12/PSMB1/HSPB1/RPS20/RPL7/RPL30/YWHAZ/EIF3E/RPL8/RPS6/PSMB7/RPL12/SET/RPL7A/NBDY/RPS4X/RPL10/RPS4Y1
## GO:0006613 RPL11/RPS8/RPL5/RPS27/RPS24/RPLP2/RPS13/RPS3/RPS25/RPS26/RPL6/RPLP0/RPS29/RPL4/RPLP1/RPS2/RPS15A/RPL13/RPL23/RPL19/RPL38/RPL36/RPS28/RPL18A/UBA52/RPS16/RPS19/RPL18/RPL13A/RPS11/RPS9/RPL28/RPS5/RPS7/RPS27A/RPL31/RPL37A/RPL3/RPSA/RPL14/RPL24/SSR3/SEC62/RPL35A/RPL9/SRP72/RPL34/RPS3A/RPL37/RPS23/RPS18/RPL10A/RPS12/RPS20/RPL7/RPL30/RPL8/RPS6/SEC61B/RPL12/RPL7A/RPS4X/RPL10/RPS4Y1
## GO:0019080 RPL11/RPS8/RPL5/RPS27/TPR/NUCKS1/RPS24/IFITM3/RPLP2/EIF3F/RPS13/PSMC3/POLR2G/RPS3/RPS25/PCBP2/RPS26/RPL6/RPLP0/RPS29/EIF2AK4/RPL4/RPLP1/RPS2/RPS15A/RPL13/RPL23/RPL19/RPL38/RPL36/RPS28/EIF3G/SMARCA4/RPL18A/UBA52/RPS16/RPS19/RPL18/RPL13A/RPS11/RPS9/RPL28/RPS5/RPS7/RPS27A/RPL31/SSB/RPL37A/EIF3L/POLR2F/RPL3/RPSA/RPL14/RPL24/RPL35A/RPL9/RPL34/RPS3A/RPL37/RPS23/RPS18/RPL10A/RPS12/POLR2J/RPS20/RPL7/RPL30/RPL8/RPS6/RPL12/RPL7A/RPS4X/RPL10/RPS4Y1
## GO:0006119 UQCRH/NDUFS2/ATP5F1C/NDUFS3/NDUFS8/ATP5MC2/ATP5F1B/NDUFA12/DNAJC15/NDUFB1/COX5A/NDUFB10/UQCRC2/NDUFAB1/COX4I1/ATP5PD/NDUFV2/ATP5F1A/ATP5F1D/NDUFS7/NDUFA3/ATP5MC3/NDUFB3/ATP5F1E/ATP5PF/NDUFA6/UQCRC1/NDUFB4/ATP5ME/NDUFC1/NDUFS6/NDUFS4/COX7C/UQCRQ/COX7A2/NDUFA4/CYCS/ATP5MF/NDUFA5/NDUFB2/COX6C/NDUFB9/CYC1/NDUFB6/STOML2/NDUFA8/NDUFB11/NDUFA1
## GO:0022613 RPL11/RPS8/RPL5/RPS27/SNRPE/RPS24/NPM3/EIF3F/EIF3M/TRMT112/MRPL11/EIF4B/PA2G4/SNRPF/RPL6/RPLP0/GTF3A/SRSF5/EIF5/EIF3J/RPS2/RSL1D1/RPL38/SNRPD1/RPS28/EIF3G/EIF3K/RPS16/FBL/RPS19/SNRPD2/NOP53/PIH1D1/RPL13A/RPS9/RPS5/RPS7/CEBPZ/SNRPG/DDX18/SF3B1/SNRPB/NOP56/XRN2/EIF2S2/EIF3L/DDX17/RPL3/RPSA/RPL14/RPL24/RPL35A/LSM6/NSA2/MRPL22/NPM1/RPL26L1/SNRPC/RPL10A/PRKDC/RPL7/EIF3E/EIF3H/PSIP1/RPS6/RPL12/RPL7A/PIN4/RPL10
## Count
## GO:0006413 81
## GO:0006402 104
## GO:0006613 64
## GO:0019080 74
## GO:0006119 48
## GO:0022613 69
pval option. The p-value, below which, the splice junction is considered to be differentially spliced.
delta option. The absolute difference in mean PSI values between the two cell populations, above which, the splice junction is considered to be differentially spliced.
min.gene.norm option. The mean normalised gene expression value across all cells from both populations, above which, the corresponding splice junction is considered to be expressed.
species option. MARVEL also supports GO analysis of "mouse".
- Alternative to using the
pval, delta, and min.gene.norm options for specifying differentially spliced genes for GO analysis, users may input their own custom genes for GO analysis using the custom.genes option.
# Plot pathways related to cardiomyocyte development
go.terms <- c("ATP metabolic process",
"regulation of stem cell differentiation",
"Wnt signaling pathway, planar cell polarity pathway",
"non-canonical Wnt signaling pathway",
"muscle filament sliding",
"actin-myosin filament sliding",
"regulation of ATPase activity",
"regulation of fibroblast proliferation",
"actomyosin structure organization",
"myofibril assembly",
"negative regulation of calcium-mediated signaling"
)
marvel <- BioPathways.Plot.10x(MarvelObject=marvel,
go.terms,
y.label.size=8,
offset=0.5
)
marvel$DE$BioPathways$Plot

Candidate gene analysis
Users may have specific candidate genes of interest to investigate in the single-cell RNA-sequencing dataset. These candidate genes may have been identified from initial differential gene expression analysis of the same dataset or from an earlier discovery dataset, e.g. bulk RNA-sequencing dataset, or that reported from the literature.
Here, we will illustrate several related functions for investigating the gene-splicing relationship between a selected gene and its corresponding splice junctions. This may reveal candidate isoforms for downstream functional validation.
We have chosen TPM2 for this illustration as we have seen earlier that this gene has a complex relationship with its splice junctions (see “Gene-splicing dynamics” section of this tutorial). Therefore, this gene will be a good example for demonstrating the intricate relationship betweeh splice junction usage and gene expression profile.
Define cell groups
Our earlier differential analysis included only iPSCs and day-10 cardiomyocytes. Here, we will include all cell types in this dataset, namely iPSCs, and day-2, -4, and -10 cardiomyocytes.
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# iPSC
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Cardio day 2
index <- which(sample.metadata$cell.type=="Cardio day 2")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 6240
# Cardio day 4
index <- which(sample.metadata$cell.type=="Cardio day 4")
cell.ids.3 <- sample.metadata[index, "cell.id"]
length(cell.ids.3)
## [1] 8635
# Cardio day 10
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.4 <- sample.metadata[index, "cell.id"]
length(cell.ids.4)
## [1] 5937
# Save into list
cell.group.list <- list("iPSC"=cell.ids.1,
"day2"=cell.ids.2,
"day4"=cell.ids.3,
"day10"=cell.ids.4
)
Gene expression profiling
marvel <- adhocGene.TabulateExpression.Gene.10x(MarvelObject=marvel,
cell.group.list=cell.group.list,
gene_short_name="TPM2",
min.pct.cells=10,
downsample=TRUE
)
## [1] "Downsampling to 5937 cells per group"
marvel$adhocGene$Expression$Gene$Plot

marvel$adhocGene$Expression$Gene$Table
## group mean.expr pct.cells.expr
## 1 iPSC 1.533693 93.39734
## 2 day2 2.064926 99.52838
## 3 day4 2.889787 99.89894
## 4 day10 2.919375 99.89894
min.pct.cells option. The minimum percentage of cells the gene is expressed in for the gene to be considered expressed and included for analysis here.
downsample option. If set to TRUE, the number of cells of each cell group will be down-sampled. The number of cells to down-sample to will be based on the size of smallest cell group.
- We observe progressive increased in TPM2 gene expression from iPSCs to early cardiomyocytes and then late cardiomyocytes.
SJ usage profiling
marvel <- adhocGene.TabulateExpression.PSI.10x(MarvelObject=marvel,
min.pct.cells=10
)
marvel$adhocGene$Expression$PSI$Plot

head(marvel$adhocGene$Expression$PSI$Table)
## group figure.column coord.intron n.cells.total n.cells.expr.sj
## 1 iPSC SJ-1 chr9:35682164:35684245 5937 5209
## 2 iPSC SJ-2 chr9:35684316:35684487 5937 4860
## 3 iPSC SJ-3 chr9:35684551:35684731 5937 2754
## 4 iPSC SJ-4 chr9:35684808:35685268 5937 1013
## 5 iPSC SJ-5 chr9:35685340:35685433 5937 919
## 6 iPSC SJ-9 chr9:35684551:35685063 5937 1073
## pct.cells.expr.sj sj.count.total gene.count.total psi
## 1 87.74 15904 21778 73.03
## 2 81.86 12730 21778 58.45
## 3 46.39 4083 21778 18.75
## 4 17.06 1152 21778 5.29
## 5 15.48 1039 21778 4.77
## 6 18.07 1229 21778 5.64
min.pct.cells option. The minimum percentage of cells the splice junction is expressed in for the splice junction to be considered expressed and included for analysis here.
- We observe 10 splice junctions to be expressed in at least one cell population.
- SJ-1 is more highly expressed in iPSCs and early cardiomyocytes relative to late cardiomyocytes.
- However, there is virtually no difference in SJ-2 expression across all cell populations.
- On the other hand, SJ-3 is more highly expressed in cardiomyocytes relative to iPSCs.
Gene-splicing relationship
Let’s investigate how the top 3 most highly-expressed splice junctions’ usage (SJ-1, -2, and -3) changes relative to TPM2 gene expression changes for all possible pair-wise comparison between iPSCs, and day-2, -4, and -10 cardiomyocytes.
# Perform differential gene expression analysis for all pair-wise group comparison
marvel <- adhocGene.DE.Gene.10x(MarvelObject=marvel)
# Perform differential splicing expression analysis for all pair-wise group comparison
marvel <- adhocGene.DE.PSI.10x(MarvelObject=marvel)
# Retrieve SJ DE results table
results <- marvel$adhocGene$DE$PSI$Data
# SJ-1
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-1"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35682164:35684245"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.1 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# SJ-2
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-2"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684316:35684487"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.2 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# SJ-3
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-3"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684551:35684731"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.3 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# Arrange and view plots
grid.arrange(plot.1, plot.2, plot.3,
nrow=3
)

log2fc.gene option. The absolute log2 fold change, above which, the gene is considered differentially expressed.
delta.sj option. The absolute difference in mean PSI values between two cell populations, above which, the splice junction is considered differentially spliced.
- SJ-1 is more highly expressed in iPSCs and early cardiomyocytes relative to late cardiomyocytes.
- However, there is virtually no difference in SJ-2 expression across all cell populations.
- On the other hand, SJ-3 is more highly expressed in cardiomyocytes relative to iPSCs.
Locate SJ position
Locating the position of the splice junctions on the isoforms corresponding to TPM2 will reveal the specific isoforms expressed in our dataset for this.
# SJ-1
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-1"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35682164:35684245"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
## [1] "Retrieving transcripts from GTF file..."
## [1] "12 transcripts identified"
marvel$adhocGene$SJPosition$Plot

# SJ-2
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-2"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684316:35684487"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
## [1] "Retrieving transcripts from GTF file..."
## [1] "12 transcripts identified"
marvel$adhocGene$SJPosition$Plot

# SJ-3
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-3"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684551:35684731"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
## [1] "Retrieving transcripts from GTF file..."
## [1] "12 transcripts identified"
marvel$adhocGene$SJPosition$Plot

rescale_introns option. If set to TRUE, the introns will be minimise in order to emphasise the exons and splice junctions.
show.protein.coding.only option. If set to TRUE, only protein-coding isoforms will be displayed.
- We may infer that ENST00000329305 is the dominant isoform expressed in this dataset because it contains the 3 top highly expressed splice junctions.
- Due to differential splice junction usage across the different cell population, it is conceivable that isoforms other than ENST00000329305 are also expressed in this dataset, and are differentially expressed across the different cell population.